16 research outputs found

    The degree of approximation of sets in Euclidean space using sets with bounded Vapnik-Chervonenkis dimension

    Abstract: The degree of approximation of infinite-dimensional function classes by finite n-dimensional manifolds is the subject of a classical field of study in mathematical approximation theory. Ratsaby and Maiorov (1997) introduced a new quantity ρ_n(F, L_q) which measures the degree of approximation of a function class F by the best manifold H_n of pseudo-dimension at most n in the L_q-metric. For sets F ⊂ R^m it is defined as ρ_n(F, l^m_q) = inf_{H_n} dist(F, H_n), where dist(F, H_n) = sup_{x ∈ F} inf_{y ∈ H_n} ‖x − y‖_{l^m_q} and H_n ⊂ R^m is any set of VC-dimension at most n, with n < m. It measures the degree of approximation of the set F by the optimal set H_n ⊂ R^m of VC-dimension at most n in the l^m_q-metric. In this paper we compute ρ_n(F, l^m_q) for F the unit ball B^m_p = {x ∈ R^m : ‖x‖_{l^m_p} ≤ 1}, for any 1 ≤ p, q ≤ ∞, and for F any subset of the Boolean m-cube of size larger than 2^{γm}, for any 1/2 < γ < 1.
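
    The deviation dist(F, H_n) above is easy to evaluate for finite point sets. A minimal NumPy sketch with illustrative names (it only evaluates the distance appearing in ρ_n(F, l^m_q); finding the optimal H_n of bounded VC-dimension is the hard part and is not attempted):

import numpy as np

def deviation(F, H, q=2):
    # One-sided deviation dist(F, H) = sup_{x in F} inf_{y in H} ||x - y||_{l^m_q}.
    # F and H are (num_points, m) arrays holding *finite* point sets in R^m;
    # q may be any real >= 1 or np.inf.
    F, H = np.asarray(F, dtype=float), np.asarray(H, dtype=float)
    dists = np.array([[np.linalg.norm(x - y, ord=q) for y in H] for x in F])
    return dists.min(axis=1).max()

# Toy example: the extreme points of the l_1 unit ball B^3_1 against a
# one-point "approximating set" H = {0}; every extreme point is at distance 1.
F = np.vstack([np.eye(3), -np.eye(3)])
H = np.zeros((1, 3))
print(deviation(F, H, q=2))   # 1.0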

    On the Value of Partial Information for Learning from Examples

    Abstract: The PAC model of learning and its extension to real-valued function classes provides a well-accepted theoretical framework for representing the problem of learning a target function g(x) using a random sample {(x_i, g(x_i))}_{i=1}^m. Based on the uniform strong law of large numbers, the PAC model establishes the sample complexity, i.e., the sample size m which is sufficient for estimating the target function accurately and with high confidence. Often, in addition to a random sample, some form of prior knowledge is available about the target. It is intuitive that increasing the amount of information should have the same effect on the error as increasing the sample size. But quantitatively, how does the rate of decrease of the error with increasing information compare to the rate with increasing sample size? To answer this we consider a new approach based on a combination of the information-based complexity of Traub et al. and Vapnik–Chervonenkis (VC) theory. In contrast to VC theory, where function classes of finite pseudo-dimension are used only for statistical estimation, we let such classes play a dual role of functional estimation as well as approximation. This is captured in a newly introduced quantity, ρ_d(F), which represents a nonlinear width of a function class F. We then extend the notion of the nth minimal radius of information and define a quantity I_{n,d}(F) which measures the minimal approximation error of the worst-case target g ∈ F by the family of function classes having pseudo-dimension d, given partial information on g consisting of values taken by n linear operators. The error rates are calculated, which leads to a quantitative notion of the value of partial information for the paradigm of learning from examples.
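
    A toy illustration of how the worst-case uncertainty about a target shrinks as more information functionals are observed. This is only a finite stand-in for the nth minimal radius of information; the paper's quantity I_{n,d}(F) additionally optimizes over approximating classes of pseudo-dimension d, which is not modelled here, and the function and index names are hypothetical:

import itertools
import numpy as np

def radius_of_information(F, idx):
    # Worst-case radius of the subset of candidates in F that agree with the
    # observed information, here simply the values at the coordinates in idx.
    # The centroid is used as a cheap proxy for the Chebyshev center, so this
    # slightly overestimates the true radius.
    F = np.asarray(F, dtype=float)
    worst = 0.0
    for g in F:
        consistent = F[np.all(np.isclose(F[:, idx], g[idx]), axis=1)]
        center = consistent.mean(axis=0)
        worst = max(worst, np.linalg.norm(consistent - center, axis=1).max())
    return worst

# Candidates: all vertices of the Boolean 4-cube. Observing more coordinates
# (more "linear operators") shrinks the worst-case uncertainty about g.
F = np.array(list(itertools.product([0, 1], repeat=4)))
for n in (1, 2, 3):
    print(n, radius_of_information(F, idx=list(range(n))))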

    Lower Bounds for Approximation by MLP Neural Networks

    The degree of approximation by a single hidden layer MLP model with n units in the hidden layer is bounded below by the degree of approximation by a linear combination of n ridge functions. We prove that there exists an analytic, strictly monotone, sigmoidal activation function for which this lower bound is essentially attained. We also prove, using this same activation function, that one can approximate arbitrarily well any continuous function on any compact domain by a two hidden layer MLP using a fixed finite number of units in each layer. Key Words: multilayer feedforward perceptron model, degree of approximation, lower bounds, Kolmogorov Superposition Theorem. §1. Introduction. This paper is concerned with the multilayer feedforward perceptron (MLP) model. A lower bound on the degree to which the single hidden layer MLP model with n units in the hidden layer and a single output can approximate any function is given by the extent to which a linear combination of n ridge function..
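
    For concreteness, a single hidden layer MLP with n hidden units is exactly a linear combination of n ridge functions. A minimal sketch (hypothetical names; tanh is chosen purely for illustration, whereas the paper constructs a special analytic, strictly monotone sigmoid that is not reproduced here):

import numpy as np

def shallow_mlp(x, A, b, c, sigma=np.tanh):
    # Single-hidden-layer MLP output: sum_i c_i * sigma(a_i . x + b_i).
    # Each term is a ridge function: it varies only along the direction a_i.
    # Shapes: x (d,), A (n, d), b (n,), c (n,).
    return c @ sigma(A @ x + b)

rng = np.random.default_rng(0)
d, n = 3, 5
x = rng.normal(size=d)
A, b, c = rng.normal(size=(n, d)), rng.normal(size=n), rng.normal(size=n)
print(shallow_mlp(x, A, b, c))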

    Approximation Bounds for Smooth Functions in C(R^d) by Neural and Mixture Networks

    We consider the approximation of smooth multivariate functions in C(R^d) by feedforward neural networks with a single hidden layer of non-linear ridge functions. Under certain assumptions on the smoothness of the functions being approximated and on the activation functions in the neural network, we present upper bounds on the degree of approximation achieved over the domain R^d, thereby generalizing available results for compact domains. We extend the approximation results to the so-called mixture-of-experts architecture, which has received wide attention in recent years, showing that the same type of approximation bound may be achieved.
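
    A rough numerical sketch of the quantity being bounded, i.e., the sup-norm error of approximating a smooth target by n ridge units. Random directions and a finite grid are used here purely for illustration (the paper's construction works over all of R^d and is not reproduced); only the outer coefficients are fitted:

import numpy as np

def ridge_fit_sup_error(f, X, n, rng=0):
    # Fit a linear combination of n ridge units tanh(a_i . x + b_i) to f by
    # least squares over the sample points X and report the sup-norm error
    # on those points.
    rng = np.random.default_rng(rng)
    A = rng.normal(size=(n, X.shape[1]))
    b = rng.uniform(-1.0, 1.0, size=n)
    Phi = np.tanh(X @ A.T + b)
    c, *_ = np.linalg.lstsq(Phi, f(X), rcond=None)
    return np.max(np.abs(Phi @ c - f(X)))

# Smooth target on a grid in [-1, 1]^2; the error typically shrinks as n grows.
xs = np.linspace(-1.0, 1.0, 21)
X = np.array([[u, v] for u in xs for v in xs])
target = lambda Z: np.exp(-np.sum(Z**2, axis=1))
for n in (5, 20, 80):
    print(n, ridge_fit_sup_error(target, X, n))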

    On the Value of Partial Information

    We set up a theoretical framework for learning from examples and side information which enables us to compute the tradeoff between the sample complexity and information complexity for learning a target function in a Sobolev functional class F. We use the notion of the nth minimal radius of information of Traub et al. [23] and combine it with VC-theory to define a new quantity I

    Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks

    We compute upper and lower bounds on the VC dimension of feedforward networks of units with piecewise polynomial activation functions. We show that if the number of layers is fixed, then the VC dimension grows as W log W, where W is the number of parameters in the network. This result stands in contrast to the case where the number of layers is unbounded, in which case the VC dimension grows as W^2. 1. MOTIVATION. The VC dimension is an important measure of the complexity of a class of binary-valued functions, since it characterizes the amount of data required for learning in the PAC setting (see [BEHW89, Vap82]). In this paper, we establish upper and lower bounds on the VC dimension of a specific class of multi-layered feedforward neural networks. Let F be the class of binary-valued functions computed by a feedforward neural network with W weights and k computational (non-input) units, each with a piecewise polynomial activation function. Goldberg and Jerrum [GJ95] have shown that..
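
    A small sketch contrasting the two growth rates for concrete parameter counts W (constants and layer-dependent factors in the actual bounds are ignored; the helper name is hypothetical):

import numpy as np

def num_params(widths):
    # Total number of weights and biases W in a fully connected feedforward
    # net with layer widths [d_in, h_1, ..., h_k, d_out].
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(widths[:-1], widths[1:]))

# Comparing the growth rates W log W (fixed depth) and W^2 (unbounded depth).
for widths in ([10, 20, 20, 1], [10, 50, 50, 50, 1]):
    W = num_params(widths)
    print(widths, "W =", W, " W log W ~", round(W * np.log(W)), " W^2 =", W * W)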

    Error bounds for functional approximation and estimation using mixtures of experts

    We examine some mathematical aspects of learning unknown mappings with the Mixture of Experts Model (MEM). Specifically, we observe that the MEM is at least as powerful as a class of neural networks, in a sense that will be made precise. Upper bounds on the approximation error are established for a wide class of target functions. The general theorem states that inf ‖f − f_n‖_p ≤ c/n^{r/d} holds uniformly for f ∈ W^r_p(L) (a Sobolev class over [−1, 1]^d), where f_n belongs to an n-dimensional manifold of normalized ridge functions. The same bound holds for the MEM as a special case of the above. The stochastic error, in the context of learning from i.i.d. examples, is also examined. An asymptotic analysis establishes the limiting behavior of this error, in terms of certain pseudo-information matrices. These results substantiate the intuition behind the MEM, and motivate applications
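
    A minimal sketch of the MEM as a gated combination of ridge-type experts. The softmax gate and tanh experts here are illustrative stand-ins for the normalized ridge functions analysed in the paper, not its exact parameterization:

import numpy as np

def mixture_of_experts(x, V, W, c):
    # Mixture-of-experts output sum_i g_i(x) * e_i(x), with gating weights
    # g(x) = softmax(V x) and simple ridge-type experts e_i(x) = c_i * tanh(w_i . x).
    logits = V @ x
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()
    experts = c * np.tanh(W @ x)
    return gate @ experts

rng = np.random.default_rng(1)
d, n = 4, 3
x = rng.normal(size=d)
V, W, c = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=n)
print(mixture_of_experts(x, V, W, c))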
